feat: replace SCAN with sorted set index for time-based schedule lookups #121
Open
tmkarthi wants to merge 3 commits into taskiq-python:main from
- Introduced `populate_time_index` parameter to backfill the time index from existing keys.
- Updated `startup` method to populate the time index if `populate_time_index` is set to True.
- Modified schedule addition and deletion to manage the time index sorted set.
- Added tests to verify time index population and cleanup behavior.

- Added `_maybe_cleanup_time_index` method to manage time index cleanup at most once per minute.
- Introduced `_cleanup_time_index` method to remove stale entries older than one hour with empty time key lists.
- Updated `delete_schedule` to call `_maybe_cleanup_time_index` for efficient cleanup.
- Enhanced tests to verify the behavior of the new cleanup methods, ensuring proper handling of stale and recent entries.

…ameter
- Updated the `_get_previous_time_schedules` method to take `current_time` as an argument, allowing for more precise cutoff calculations.
- Adjusted the logic to use the provided `current_time` for determining previous schedules, ensuring no overlap with the current window.
- Modified the call to `_get_previous_time_schedules` in the first run logic to pass the `current_time` parameter.
Summary
- Replace `redis.scan_iter()` in `_get_previous_time_schedules()` with `ZRANGEBYSCORE` on a sorted set time index (`{prefix}:time_index`), reducing lookup complexity from O(N) over the entire Redis keyspace to O(log(K) + M), where K is the number of time entries and M is the number of matching results
- Add the missing `self._is_first_run` guard in `get_schedules()`; without it, `_get_previous_time_schedules()` runs on every scheduler tick, not just the first
- Keep index entries intact when `add_schedule` and `delete_schedule` run at the same minute
- Add a `current_time` parameter to `_get_previous_time_schedules()` to prevent window overlap with the caller's already-captured timestamp

Problem
The scheduler calls `get_schedules()` every N seconds (default 60). On the first run, `_get_previous_time_schedules()` uses `redis.scan_iter("{prefix}:time:*")` to find past schedules. `SCAN` iterates over every key in the Redis database and pattern-matches each one; if the Redis instance holds millions of keys (result backends, broker streams, caches), this becomes extremely slow and can overwhelm Redis.
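As a rough in-memory analogy for the cost difference (plain Python, not Redis code; the `app` prefix, the key names, and the helper names are made up for illustration), compare pattern-matching every key with a binary search over a sorted list that holds only the time entries:

```python
import bisect
from fnmatch import fnmatchcase

# Hypothetical keyspace: two schedule time keys buried among unrelated keys.
keyspace = [f"cache:{i}" for i in range(1000)] + [
    "app:time:2024-01-01T10:00",
    "app:time:2024-01-01T10:05",
]


def scan_like(keys, pattern):
    """Visit every key and pattern-match it, as SCAN with MATCH does: O(N)."""
    return [key for key in keys if fnmatchcase(key, pattern)]


# Sorted, minute-aligned epoch seconds for the time entries only.
time_index = [1704103200, 1704103500]  # 10:00 and 10:05 UTC on 2024-01-01


def range_by_score(index, low, high):
    """Binary-search both bounds, then slice the matches: O(log K + M)."""
    left = bisect.bisect_left(index, low)
    right = bisect.bisect_right(index, high)
    return index[left:right]
```

`scan_like` does work proportional to the 1002 keys in the list, while `range_by_score` only depends on the number of time entries and the size of the result, which is the property a sorted set lookup offers.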
Additionally, `get_schedules()` was missing the `_is_first_run` check when `skip_past_schedules=False`, causing the expensive `SCAN` to execute repeatedly instead of just once.

Solution
Maintain a Redis sorted set (`{prefix}:time_index`) as a secondary index. Scores are UTC timestamps (truncated to the minute). `add_schedule()` does a `ZADD` alongside the existing `RPUSH`, and lookups use `ZRANGEBYSCORE` instead of `SCAN`.

A `populate_time_index` constructor parameter (default `False`) enables a one-time `SCAN` on startup to backfill the index from existing `{prefix}:time:*` keys, for migrating from older versions.

Test plan
- `SCAN` is not called during normal scheduler operation (monitor with `redis-cli MONITOR`)
- `populate_time_index=True` migrates existing time keys into the sorted set
- `add_schedule` / `delete_schedule` at the same minute does not lose index entries
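For reviewers who want a concrete picture of the index described under Solution, here is a minimal sketch of the write and read paths. It is not the code in this PR. The helper names (`minute_score`, `time_key`, `add_to_index`, `previous_time_keys`), the use of the time key as the sorted set member, the numeric key suffix, and the exclusive upper bound are all assumptions made for illustration. `client` is expected to behave like a `redis.asyncio.Redis` instance.

```python
from datetime import datetime, timezone
from typing import Any


def minute_score(moment: datetime) -> int:
    """Return UTC epoch seconds for `moment`, truncated to the start of its minute."""
    if moment.tzinfo is None:
        # Assumption: naive datetimes are already expressed in UTC.
        moment = moment.replace(tzinfo=timezone.utc)
    truncated = moment.astimezone(timezone.utc).replace(second=0, microsecond=0)
    return int(truncated.timestamp())


def time_key(prefix: str, moment: datetime) -> str:
    """Hypothetical name of the per-minute list key under {prefix}:time:*."""
    return f"{prefix}:time:{minute_score(moment)}"


async def add_to_index(client: Any, prefix: str, moment: datetime, schedule_id: str) -> None:
    """RPUSH the schedule onto its time key, then ZADD that key to the index."""
    key = time_key(prefix, moment)
    await client.rpush(key, schedule_id)
    # Member = time key name (assumption); score = minute-truncated UTC timestamp.
    await client.zadd(f"{prefix}:time_index", {key: minute_score(moment)})


async def previous_time_keys(client: Any, prefix: str, current_time: datetime) -> list:
    """Return indexed time keys strictly before the minute of `current_time`."""
    # A leading "(" makes the ZRANGEBYSCORE bound exclusive, so the caller's
    # current window is not returned a second time.
    upper = f"({minute_score(current_time)}"
    return await client.zrangebyscore(f"{prefix}:time_index", "-inf", upper)
```

The `RPUSH` and `ZADD` above are two separate round trips. Whether they need to be grouped (for example in a pipeline) to satisfy the same-minute add/delete case in the test plan is a design choice this sketch does not settle.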